Similarity Multidimensional Indexing
Authors
Abstract
The multidimensional k-NN (k nearest neighbors) query problem arises in a wide variety of database applications, including information retrieval, natural language processing, and data mining. To solve it efficiently, a database needs an indexing structure that supports this kind of search. However, an exact solution is hardly feasible in a multidimensional space. In this paper we describe and analyze an indexing technique for the approximate solution of the k-NN problem. Construction of the indexing tree is based on clustering. Construction of the hash index is based on s-stable distributions. The indices are implemented on top of a high-performance industrial DBMS.
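As a rough illustration of hashing with stable distributions for approximate k-NN, the sketch below uses the common scheme h(v) = floor((a . v + b) / w), where the entries of a are drawn from a stable distribution. It assumes the Gaussian (2-stable) case and Euclidean distance, a single hash table, and in-memory storage; the function names and parameters (`make_stable_hash`, `build_index`, `query`, `w`, `k_funcs`) are hypothetical and are not taken from the paper, whose indices live inside a DBMS.

```python
import math
import random


def make_stable_hash(dim, w, rng):
    """Build one hash function h(v) = floor((a . v + b) / w).

    Assumption: a is drawn from a Gaussian, i.e. the 2-stable case.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)  # random offset within one bucket width

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


def build_index(points, dim, k_funcs=4, w=4.0, seed=0):
    """Hash every point into a bucket keyed by a tuple of k_funcs hash values."""
    rng = random.Random(seed)
    funcs = [make_stable_hash(dim, w, rng) for _ in range(k_funcs)]
    table = {}
    for idx, p in enumerate(points):
        key = tuple(h(p) for h in funcs)
        table.setdefault(key, []).append(idx)
    return funcs, table


def query(q, points, funcs, table, k):
    """Return up to k indices from q's bucket, ordered by Euclidean distance.

    The answer is approximate: true neighbors in other buckets are missed.
    """
    key = tuple(h(q) for h in funcs)
    candidates = table.get(key, [])
    return sorted(candidates, key=lambda i: math.dist(q, points[i]))[:k]
```

Only the points that share the query's bucket are compared exactly, which is where the speedup and the approximation both come from. Practical schemes usually keep several independent tables to raise recall; that is omitted here for brevity.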
Similar resources
Indexing Issues in Supporting Similarity Searching
Indexing issues that arise in the support of similarity searching are presented. This includes a discussion of the curse of dimensionality, as well as multidimensional indexing, distance-based indexing, dimension reduction, and embedding methods.
Individual Study Option: Scalable Multimedia Database Indexing
Most image or video search engines perform similarity search by extracting and storing feature vectors from the multimedia objects. Thus, the similarity search is transformed into a search for points in the high-dimensional feature space that are close to a given query point. Multidimensional indexing structures are supposed to cut this process short and quickly return the m...
An Efficient Indexing Method for Box Queries in NDDS Spaces using BoND-tree
Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as bioinformatics, biometrics, data mining and E-commerce. Efficient similarity searches require robust indexing techniques. Box queries (or window queries) are a type of query which specifies a set of allowed values in each dimension. Unfortunately, exi...
Indexing Images with Multiple Regions
Similarity indexing using Spatial Access Methods (SAMs), e.g., R-trees, assumes that each data entity (or query) is represented by exactly one multidimensional point. However, several applications, including indexing and retrieval of multimedia data such as one-dimensional signals and images, require that each data entity be represented by multiple points in a multidimensio...
On the effective clustering of multidimensional data sequences
In this paper, we investigate the problem of clustering multidimensional data sequences such as video streams. Each sequence is represented by a small number of hyper-rectangular clusters for subsequent indexing and similarity search processing. We present a linear clustering algorithm that guarantees the predefined level of clustering quality, and show its effectiveness via experiments on vari...
Journal title:
Volume, issue
Pages -
Publication date: 2011